Gravitational wave parameter estimation with compressed likelihood evaluations 
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One of the main bottlenecks in gravitational wave (GW) astronomy is the high cost of performing 
parameter estimation and GW searches on the fly. We propose a novel technique based on Reduced 
Order Quadratures (ROQs), an application and data-specific quadrature rule, to perform fast and 
accurate likelihood evaluations. These are the dominant cost in Markov chain Monte Carlo (MCMC) 
algorithms, which are widely employed in parameter estimation studies, and so ROQs offer a new 
way to accelerate GW parameter estimation. We illustrate our approach using a four dimensional 
GW burst model embedded in noise. We build an ROQ for this model, and perform four dimensional 
MCMC searches with both the standard and ROQs quadrature rules, showing that, for this model, 
the ROQ approach is around 25 times faster than the standard approach with essentially no loss 
of accuracy. The speed-up from using ROQs is expected to increase for more complex GW signal 
models and therefore has significant potential to accelerate parameter estimation of GW sources 
such as compact binary coalescences. 



I. MOTIVATION AND CONTEXT 

Computing correlations between data and models de- 
scribed by large dimensional parameter spaces is an im- 
portant aspect of many scientific disciplines. Obtain- 
ing estimates of the parameters of observed signals is 
crucial to extract the most from multi-billion dollar ex- 
periments such as gravitational wave (GW) detectors 
(i.e., advanced LIGO, advanced Virgo, Indigo, and KA- 
GRA) ^0]. However, carrying out parameter estimation 
on large dimensional parameter spaces can be computa- 
tionally expensive. Costs grow further if several different 
models or alternative theories of gravity (see, for exam- 
ple, [SHU] and [I3HIS]j respectively) are used to analyse 
the data as a prelude to Bayesian model selection. It is 
therefore of great importance to develop efficient meth- 
ods for analysing the data to ensure that all the desired 
science can be extracted from the data in a reasonable 
time. 

One of the primary methods for computing the prob- 
ability distribution for the parameters of a given signal 
in a data set is Markov chain Monte Carlo (MCMC). 
This requires evaluating the posterior probability of the 
model parameters throughout parameter space. When 
the likelihood and hence posterior probability is expen- 
sive to evaluate, MCMC algorithms can become compu- 
tationally prohibitive. In such cases, approximate meth- 
ods such as the Fisher matrix are widely used because 
they are significantly cheaper than a full Bayesian analy- 
sis. Several rather optimistic assumptions, however, such 
as high signal-to-noise ratios are often not satisfied in 
practice. Recently, other sampling approaches [16] for 
computing the maximum likelihood estimator have been 
proposed for low signal-to-noise scenarios. 

An alternative way to improve the speed of MCMC 
algorithms is to reduce the cost of evaluating the likeli- 



hood at each parameter space point. This strategy has 
motivated work on directly interpolating the likelihood 
[TTH^H] and training a neural network to "learn" likeli- 
hood data 21|. At least in the case of direct interpolation 
there could be technical obstacles for likelihoods which 
require waveforms with many cycles and/or higher di- 
mensionality [181 I20j . In this paper we describe a novel 
technique for fast, accurate calculations of correlations 
between data and modeled waveforms, fine tuned for ap- 
plications such as MCMC. The approach is based on Re- 
duced Order Modeling (ROM) and, as such, aims to sig- 
nificantly reduce the problem's dimensionality by exploit- 
ing redundancies. The result is a compressed represen- 
tation of the likelihood thereby reducing the cost of each 
evaluation. Generalizations to higher dimensions and/or 
many cycles are readily handled within the method's ex- 
isting framework j^ . 

Within typical GW physics applications, the number 
of required correlations quickly grows with the number 
p of physical parameters and the number of GW cycles. 
For example, the number of search templates scales as 
-- (1 - MM)-P/2 [23], where MM is the minimal match 
of the catalog. For a compact binary coalescence with 
p = 8 intrinsic parameters lasting for 10^ cycles we could 
need up to ~ lO""^ templates for a fully coherent search 
|24) . In light of these scalings there is an obvious need 
for reducing the cost of each correlation. 

Correlation costs typically scale with the length N of 
the data, which depends on both the observation time 
and sampling rate. Furthermore, standard fast converg- 
ing numerical integration rules for smooth functions, such 
as Gaussian quadratures, lose their fast convergence in 
the presence of noisy (non-smooth) data. In this paper 
we show how integrals with noisy data can be computed 
with a cost not set by the Nyquist sampling rate or obser- 
vation time [35] , but rather the "information content" of 



the gravitational waveforms themselves. The integration 
converges fast, typically exponentially, with the number 
of sparse data samples m drawn from the full data set, 
even in the presence of noise. The overall likelihood cost 
is thereby reduced to m <S^ N. 

Our approach for speeding up correlation computa- 
tions is based on a recently proposed Reduced Or- 
der Quadrature (ROQ) for parametrized functions ^2] . 
ROQ combines dimensional reduction with the Empirical 
Interpolation Method (EIM) [321 HZ] to produce a nearly 
optimal quadrature rule for parametrized systems. To 
do so, it exploits smooth dependence with respect to pa- 
rameter variation, when available, to achieve very fast 
convergence with the number of data samples. Even in 
the absence of noise, in many cases ROQs outperform the 
best known quadrature rule (Gaussian quadratures) for 
generic smooth functions [22] • The key aspect of this ap- 
parent super-optimality is to leverage information about 
the space of functions in which we are interested. 

In the context of GW parameter estimation, the use of 
ROQs can significantly improve the performance of exist- 
ing numerical algorithms by reducing the computational 
cost of computing a waveform overlap (correlation) with 
the data. Here we illustrate this application of ROQs to 
GW parameter estimation using a simple model of a sine- 
Gaussian GW burst waveform. This model is chosen as 
a toy one to illustrate the method. Although such wave- 
forms have been used in GW searches (see, for example, 
[55]), the cost of their likelihood evaluations is not sig- 
nificant, so we are not suggesting that this application is 
one for which ROQs are required. However, we demon- 
strate that even for such a simple model the speed-up 
from ROQs is significant and we expect that comparable 
or greater speed-ups will be possible for more complex 
GW signal models [22] . 

This paper is organised as follows. In section jll] we 
present an overview of the proposed approach. In sec- 
tions HI IV and[V]we introduce the building blocks of the 
method; namely. Reduced Order Modelling, the Empir- 
ical Interpolation Method and Reduced Order Quadra- 
tures, as well as the GW burst model. Finally, in Sec. |VI| 
we apply the ROQ approach to perform a MCMC search 
using the burst model, explicitly showing that ROQ can 
considerably speed up MCMC computations. Among the 
new aspects that we address compared to [52] are how to 
deal with the arrival time of the GW signal, and the ap- 
plication of the technique to noisy data. In Appendices 
\K\ and IB] we summarise the greedy approach for gener- 
ating a Reduced Basis, and the Empirical Interpolation 
Method, respectively. 



II. METHODOLOGY 

In this paper we are interested in improving the per- 
formance of GW parameter estimation by using ROQs. 
We assume that the detected data stream is given by 
s(i) = h{t;X) + n{t), where h{t;X) is the GW signal 



that we want to characterise, which depends on a multi- 
dimensional set of source parameters A, and n{t) is in- 
strumental noise. 

In the context of Bayesian parameter estimation the 
posterior probability distribution function (PDF) pro- 
vides complete information about the parameters of the 
signal: 



p{\\s):=Cp{X)P{s\\) 



(1) 



Here p (A) is the prior probability density, C an overall 
normalization constant, and P{s\\) is the likelihood that 
the true parameter values are given by a particular A, or 
in other words, the likelihood that the signal is present 
in the data stream. For Gaussian, stationary noise the 
likelihood is 



P(s|A)cx exp(-xV2) 



(2) 



where 



X^ := {n\n) = (s(t) - h (t; A) \s{t) - h (i; A)) (3) 

is the weighted norm of the noise realization n(i), defined 
by the weighted inner product (see e.g. [24] ) 



Ju.. Snif) 



(4) 



with * denoting complex conjugation and Sn{f) the 
power spectral density of the detector's noise. Owing 
to the form of S'„(/) in GW physics, the lower limit of 
integration in Eq. Q is sometimes replaced by /min > 0. 

When dealing with high dimensional problems, the 
process of mapping the likelihood (or the posterior) sur- 
face can become very expensive. MCMC algorithms are a 
useful technique for searching through such large spaces, 
by following a random walk in parameter space, with the 
probability of a sample being chosen at any point be- 
ing proportional to the posterior probability. However, 
since a MCMC search depends on the number of sampling 
points, as well as the dimensionality of the problem, it 
can still be a very expensive algorithm and in many cases 
prohibitively so. 

This paper proposes application and data-specific 
quadrature rules for scenarios such as GW parameter es- 
timation, where correlations between noisy data and a 
family of functions have to be repeatedly evaluated. The 
quadrature rules employed here are a variation of the 
ROQ introduced in Ref. [22| for the case n{t) — 0, and 
their construction follows several layers of dimensional 
reduction that are explained in the different sections of 
this paper, namely: 

1. Construct a basis for the space of waveforms 
of interest. Offline stage. 

Described in Sec. |III| A Reduced Basis-greedy ap- 
proach has several advantages, including an ap- 
proximation to the most relevant points in parame- 



ter space, but the proposed ROQ can use any choice 
of a "good" basis. 

2. Identify the empirical interpolation points 
associated with the above basis. Offline 
stage. 



Described in Sec. |IV| This step provides, through 
a greedy approach, the set of most relevant points 
in the physical diniension(s) , and a nearly optimal 
global interpolant associated with the basis con- 
structed in Step 1. These EIM nodes are to be 
used as integration points in the ROQ rule. 

3. Given any stream of data, construct the 
weights of the ROQ. Startup stage. 

Described in Sec. |VJ These weights are linear com- 
binations of correlations between the data and the 
basis elements of Step 1. 

4. Fast likelihood evaluations. Online stage. 

Described in Sec.|Vl] The ROQ uses the nodes com- 
puted in Step 2 and the weights computed in Step 
3 to perform fast and accurate evaluations of over- 
laps between the data and any waveform within the 
model. 

Section |VI| discusses the results of putting the above 
pieces together into MCMC simulations for parameter es- 
timation of mock data corresponding to the burst model 
family of waveforms described below in Eq. ([7|. From 
these simulations, in particular, we quantify the signifi- 
cant speed-ups that are obtained even for such a simple 
GW model when using the proposed ROQ. 



III. REDUCED ORDER MODELING 

Roughly speaking, ROM deals with data which can be 
represented by fewer degrees of freedom than those of 
the full problem with or without loss of accuracy. For 
a given problem there are many available methods for 
revealing a reduced representation. Classical methods 
such as Principal Component Analysis, Proper Orthogo- 
nal or Singular Value Decompositions (SVD) [29], which 
are related to each other, were introduced as early as the 
1800's (see [3D] for a review of their history) and reveal 
low-rank approximations within existing data. Other ap- 
proaches such as Reduced Basis (RB) (see, as a sample, 
[5TH55] or [32] for a recent review), are specifically de- 
signed for parametrized problems whose solution is ex- 
pensive to evaluate but also carry advantages when deal- 
ing with "big data" problems (e.g. if the data cannot fit 
into memory or the SVD cost becomes prohibitive). 

Both RB-greedy and SVD are projection-based ROM 
algorithms. If the waveforms are known at the training 
points 



with \i some parametrization of the samples, a 
projection-based method identifies a basis {ci}™]^ such 
that 



H-;X) 



;c,(A)e,(-), forAer 



(5) 



with 771 < M and where the coefficients Ci are given by 
Eq. ( A5 1 (see Appendix [A] for more details) . If the prob- 
lem is amenable to ROM, then m < M or even m ^ M. 
To be more concrete, in the GW case A would represent 
the (intrinsic and/or extrinsic) parameters of the prob- 
lem, and M the number of available parameter samples; 
say, the number of waveforms in a catalog or even the con- 
tinuum, Af — ^ c». A generic waveform with associated 
parameter A would be a function of time or frequency, 

h^h{t;X) or h^h{f;X). 

In what follows, we will refer to A as the parameter di- 
mension and / or t as the physical one. 



A. Generating a basis 

Suppose for any A the GW template h{-;\) has an 
accurate approximation of the form ([5| in some basis 
{^i}i^i- Recent work [JUl - HH] has shown that for fixed 
but arbitrary physical and parameter ranges, a small 
number of basis functions is sufficient to accurately rep- 
resent any waveform of the same physical model in that 
range. Furthermore, when the basis is generated through 
a RB-greedy algorithm (described in Appendix IA| , the 
approximation error is guaranteed to yield a nearly op- 
timal solution of the so-called n-width approximation 
problem [SJ US] . In the cases of interest this means ex- 
ponential convergence of the representation error defined 
below in Eq. (rol with respect to the number of basis 
functions, resulting in a very compact basis. In addition, 
the number of basis elements often exhibits negligible in- 
crease as the dimensionality of the problem grows 42J . 

Of the basis set {ei(-)}™ j^ we require 77i to be small 
and the approximation to satisfy 



max mm 



/7(-;A)-^c,(A)e,(. 



<e, 



(6) 



r:={AJ 



M 



where e is a user defined bound for the error (in our cases, 
typically ~ 10~^^, see for example Fig. [2]), the coefhcients 
{ci} are chosen so as to optimize the approximant (see 
Appendix [A]) , and the largest error in the parameter re- 
gion of interest is taken. That is, a„i quantifies the error 
of the "worst best" approximation by the basis. 

Many possible basis choices, including traditional 
ones such as Chebyshev polynomials or Fourier basis, 
could satisfy the above required criteria. In practice, 
application-specific bases usually provide better accuracy 
for a given 777 and also lead to a well-conditioned global 



interpolation procedure, as described in Sec. IV 



We have mentioned the RB-greedy algorithm as one 
approach to generate a good basis. For definiteness, in 
the simulations of this paper our basis is constructed with 
such an algorithm (described in Appendix [A|). Our pro- 
posed ROQ rule is, however, directly applicable to any 
projection-based ROM basis, including SVD [HHHIIJT] . 



B. An example of RB: burst waveforms 

In order to illustrate our approach, we consider a four 
parameter GW-burst waveform given by the following 
sine-Gaussian waveform: 



h{t; A) := Ae-(*-*'^)'/(2a^) sin(27r/o(i - Q) 



(7) 



where A, /q and a are the amplitude, frequency and 
width of the waveform respectively, and where tc is the 
arrival time of the GW-burst signal and t E [—00,00]. 
The Fourier transform (FT) of this waveform is given by 



hif,t,;X)^e''^^'^hif;\) 



(8) 



where h{f; A) is the FT of the GW-burst at t^ = 0: 
h{f;X) ^i2AaV2nsmh{4TT^a^faf)e-^''^°'^^^o+f^K (9) 

This waveform family is described by four free parameters 
A = {a, fQ,tc,A). We will build the RBs over just two 
parameters (a, /o), since the others are extrinsic and can 
be handled differently, as discussed in Sec. |VC[ 

We build the RB for these burst waveforms over the 
parameter space defined by 



= [.02, 2] sec , /o = [.01,l]Hz, 



(10) 



sampled with 180 equally spaced training points in each 
dimension. Unless otherwise stated, the range given in 



Eq. ( 10 1 will be the default one for all experiments and 



the units will always be in seconds and Hertz. To repre- 
sent any burst waveform drawn from the above range we 
take 



T = 32 sec 



/s = 64Hz, 



(11) 



to be our default observation time and sampling rate. 
Similarly, for the injected signals our default parameters 
will be 



1 



/o = 0.25 



tr 



0.1. 



(12) 



We will also present results for a two parameter model in 
which tc is fixed at tc = and where A is chosen to give a 
specified signal-to-noise ratio (SNR), p, with p^ = {h\h) 
for the inner product defined by Eq. (Hi). 

Fig. [1] shows the 54 points, out of 180 x 180 samples, 
selected by the greedy algorithm to build the RBs, and 
the order in which the first 10 points are picked, while 



Fig.[2]shows the representation error of the training set as 
a function of the number of RB elements. Consistent with 
previous experience, we have found that if the training 
set is dense enough (and for this model, one of 180 x 
180 samples is) then any waveform not present in the 
training set yields similarly small representation errors 
by the basis; see for example [13 132] for more details. 

Greedy points for sine-gaussian waveforms 

(2) (7) (3) (6) (4) (9) (5) 
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FIG. 1: Points selected by the greedy algorithm for the model 
family of burst waveforms ([7| with the default range ( 10 1 for 
its parameters. The first 10 greedy points are represented 
with markers indicating the order of selection, with parenthe- 
sis serving as a visual aid. The inset figure shows with black 
asterisks all the 54 selections, out of 180 x 180 samples, chosen 
by the greedy algorithm. 
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o> 10 




FIG. 2: Approximation error as a function of the number of 
basis generated with a greedy algorithm from the previous 
figure. The error am, defined by Eq. m, is computed as the 
maximum within the parameter region given in Eq. ( 10 1. 



So far we have described the generation of basis ele- 



ments. The next step is the prediction (as opposed to 
projection) of waveforms from a sparse set of well chosen 
frequency samples. 



IV. EMPIRICAL INTERPOLATION 

Within a projection-based approximation one has 

m 

/i(x) « ^Cjej(a;), (13) 



where the coefficients Ci are given by Eq. ( A5 ) . Comput- 
ing the projection coefficients Ci requires full knowledge 
of the function h (see Appendix IA] for more details). 

Given a basis and partial sampling of h, in the in- 
terpolation problem we are interested in predicting the 
underlying function. In what follows, we will first re- 
view the classical interpolation problem, using a polyno- 
mial basis before discussing empirical interpolation with 
application-specific basis functions, and finish this sec- 
tion with an example for burst GWs. 



A. Classical interpolation w^ith polynomials 

Classically the interpolation problem for a function 
h{x) is the following. Given a set of m nodes {xi}, known 
function evaluations {hi :— h(xi)}, and a basis e^ = Pi{x) 
where pi (x) is a degree i < m—1 polynomial, find an ap- 
proximation (the interpolant) 



Im[h]{x) = 2jciPj(a;) « h{x) 



such that 



^m[h]{xi) = hi for i = 1, 



(14) 



(15) 



That is, the approximant is required to agree with the 
function at the set of m nodes. 

We can show that the problem defined by Eqs. ( 14|15 ) 
has a unique solution in terms of Lagrange polynomi- 
als. Given a convergence rate for the projection-based 
approximation Eq. ( 13 ) we might wonder how much ac 

Hi 



curacy is lost by trading it for the interpolation Eq. 
and how to optimally choose the node points Xi. When 
the relevant error measurement is the maximum point- 
wise error, Chebyshev nodes are known to be near- 
optimal, bringing an additional error which grows like 
log(m) |48l|49]. 

For application-specific bases, a good set of interpola- 
tion points is not known a-priori. Next we describe an 
approach for identifying a nearly-optimal set. 



B. Empirical interpolation ^vith RB 

The Empirical Interpolation Method was proposed in 
2004 [2S] as a way of identifying a good set of interpola- 
tion points for arbitrary basis sets on multi-dimensional 
unstructured meshes and has since found numerous ap- 
plications [371 [SUHSS]- Recently, the EIM was shown 
to dramatically speed up parameterized inner product 
(overlap) computations in the absence of noise [52] . For 
definiteness we will focus on the frequency-domain case. 
In general, a well-posed interpolation problem for m basis 
functions requires m interpolation points {i^i}i!^i- Addi- 
tionally, these points must ensure an accurate approxi- 
mation. Crucially, the EIM algorithm selects the interpo- 
lation points as a subset of the full N /2 + 1 data samples 

(this choice is motivated in Sec. V A), {i^;}™ i C {fiJiJo , 
and m < N/2 or even m <C N/2. 

With ROM we seek to find an empirical (that is, 
problem-dependent) global interpolant 



I„[;i](/;A):=^c,(A)e,(/) 



(16) 



where the Ci coefficients are defined as solutions to the 
interpolation problem 

Irn[h]iFk:\) = h{Fk;\), Vfc = l,...,m. (17) 

For the moment, we shall assume that the EIM points 
are known (the precise way of finding them is explained 
in Appendix [B|) and proceed to describe how we use them 
to find the EIM interpolant. Equation (17) is equivalent 



to solving an m-hy-m system Ac = h for the coefficients 
c, where 



A-.'- 



( ei(Fi) 62(^1) 

ei(i^2) 62(^2) 

ei(i^3) 62(^3) 

Vei(F„) e2(F,„) 



e™(i^i) \ 

e™(i^2) 

e™(i^3) 

^m\^ •nx) / 



(18) 



The EIM algorithm ensures that the matrix A is invert- 
ible, with c = A~^h the unique solution to Eq. (17). As 



A is parameter independent we have, for all values of A, 



Z„[;i](/;A)-e^(/) A-^h{\) 



(19) 



where e ^ — [ei(/), . . . , e„i(/)] denotes the transpose of 
the basis vectors, which we continue to view as functions. 
The empirical interpolant is nearly optimal in the sense 
that it satisfies 

max \\h{--\) - Z,^\h{-- A)]||2 < A^a™ , (20) 



where 0^ characterizes the representation error of the ba- 
sis as defined in Eq. (pi) and A™ is a computable Lebesgue 



constant. For more details and in the context of GWs, 
see, for example, |22j . For problems with smooth depen- 
dence with respect to parameter variation we can expect 
exponential decay of am with respect to ni and therefore 
of the EIM error ([20| as well. 



C. An example of EIM: burst w^aveforms 

We now provide a qualitative outline of the EIM algo- 
rithm, with more details given in Appendix [B| As input 
the algorithm takes the basis set {ei}™]^ and an arbi- 
trary number and choice of data samples {/iljj^g from 
which the empirical interpolation points {i^i}™ ^ are to 
be selected. The EIM algorithm proceeds as follows 

1. The first point is chosen to maximize the value of 



|ei(/i)|; that is, 
pies. 



leil-F^i)! > |ei(/i)| for all data sam- 



Next, an empirical interpolant for the second basis 
function is built using only the first basis function: 
From Eqs. ( |16|17 ) or, equivalently, Eq. ( 19 1 we have 
^i[e2]{f)] = ciei(/) where a = e2(Fi)/ei(Fi) has 
been found from Eq. (171 with k = 1. 



3. The second empirical interpolation point is cho- 
sen to maximize the value of the pointwise in- 
terpolation error of Ii[e2]{f) — e2(/); that is, 
|2:iN(F2)- 62(^2)1 > |IiN(/.)-e2(/.)| for all 
data samples. 

4. Steps 2 and 3 are then repeated to select the re- 
maining TO — 2 points. 

As described, the EIM follows a greedy approach, al- 
beit somewhat different from that one we used to build 
a Reduced Basis. While a greedy algorithm to build a 
RB selects the most relevant points in parameter space, 
the EIM selects the most relevant points in the physical 
dimension(s). 

Fig. [3] provides a graphical illustration of the EIM al- 
gorithm's first iterations for the family of sine-Gaussian 
burst waveforms ^, using the RB described in Sec. |IIIB| 
All 771 = 54 point selected by the greedy algorithm (see 
Sec. Ill B ) are shown in Fig. El Finally, in Fig. [5] we 
show the largest empirical interpolation error of 10, 000 
waveforms drawn randomly from the parameter region 

[Eq. m]. 



V. REDUCED ORDER QUADRATURES 

As anticipated and summarized in Sec. [Tlj building an 
ROQ has offline and startup costs, with the advantage 
of very fast online evaluations. In the offline stage we 
construct the basis and EIM points. This stage is inde- 
pendent of any data/ signal. The startup stage, in turn, is 
data-dependent and completes the ROQ, which preserves 
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FIG. 3: Iterations 1 (top) and 2 (bottom) of the EIM al- 
gorithm. The first EIM point is defined by the location of 
max(|ei|). To identify the second point we: i) build the em- 
pirical interpolant X\\e-2\ of 62 using e\ and the sample point 
F\ (cf Eq. ( |19[ |), ii) compute the pointwise error Ii[e2] — 62; 
iii) the second EIM point is then defined by the location of 
max(|Ii[e2] — 62!). The process continues until all m empiri- 
cal interpolation points are found. 



the accuracy of any quadrature rule of interest with a 
number of quadrature nodes which equals the number of 
basis functions. Roughly speaking, the accuracy of the 
resulting ROQ is comparable to that of the basis, with 
the nodes chosen as a subset of the data points at which 
the signal has been sampled. 

The details of how to construct an ROQ rule mimic 
well known quadratures rules. Let us briefly recall how 
these standard quadratures are derived for the integra- 
tion of a real function h(x): the function is approximated 
by its polynomial interpolant (cf. Sec. |IVp and the latter 
integrated exactly to compute the weights of the rule. 
Namely, given the interpolation approximation 



h{x) sa ^/i(xi)^i(x) , 



i=l 



where iiix) are Lagrange polynomials (see Sec. IV A I, 
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FIG. 4: Empirical interpolation points (red asterisks) selected 
by the EIM algorithm for the sine-Gaussian waveforms. These 
points are a subset of the original data (which in this case has 
equidistant spacing A/, see Sec. |V A[ ) and cluster towards 
lower / ~ IHz, as expected. Four representative waveforms 
are depicted for all possible combinations of max/min values 
of the waveform frequency /o and width a. Greater diversity 
in waveform features is evident at lower frequencies. 
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FIG. 5: Approximation error as a function of the number of 
Reduced Basis (RB) generated with a greedy algorithm (solid 
blue), and for the Empirical Interpolant (dashed black), de- 
fined as am and max^ \\h — Im[h]\\'^, respectively. The dashed 
red line shows the error bound [see Eq. ( 20 1] . 



standard quadratures are derived as 

/m „ 

h{x)dx w ^ h{xi)ai a., := / ti{x) . 



i=l 



to the trapezoidal rule, for m = 2 to Simpson's rule, etc. 
By additionally choosing the location of the interpolation 
points we can maximize the exactness of the quadrature 
rule for polynomials, leading to Gaussian quadratures. 



A. Riemann sum with uniform sampling 

In general, the output of a GW detector is comprised 
of data segments of duration T, which are uniformly sam- 
pled every At seconds. Assuming for simplicity tc = 
for the time being (how to include the arrival time is dis- 
cussed in Sec. VCl, for iV = T/{At) data samples the 



discrete GW waveform 

/i(jAi;A), j=O...N, 

has discrete FT h(fi] A), which is known at the frequency 
points {f^}fJ^_ = {0, /o, 2/o, . . . , (iV/2) /o}, where /„ = 
1/T = (NAt) = A/ is the fundamental frequency and 

/max = {N/2)fo. 

Due to the fact that the data taking procedure dictates 
the instants of time at which the (non-smooth and noisy) 
signal is known, an obvious numerical approximation to 
Eq. (l4| is a low order discrete Riemann sum. 



{a\b) 



(alb) 



NA 



N/2 
i=0 



-aim if i. 

Snif^) 



(21) 



Thus, the computational cost of Eq. ( |21[ ) depends on N, 
which in turn depends on the data sampling rate. 

Whether performing searches or parameter estimation 
studies, the numerical integral Eq. (21) is repeatedly 



evaluated for a variety of GW templates h{f,\). Next 
we show how such integrals can be computed with a cost 
not set by the Nyquist sampling rate, but rather the 
"information content" of the GW templates themselves; 
namely, the number of basis functions, m. This is similar 
in spirit to the fact that compressed sensing can "beat" 
Nyquist- Shannon sampling criteria |54j . 



B. Building the ROQ 

Consider a discrete approximation {■\-)d to the con- 
tinuum scalar product of Eq. Q. The Riemann sum 
Eq. (21 1 is a natural choice in data analysis studies. 



Interpolation at equally spaced points for m = 1 leads 



whether for Bayesian parameter estimation or searches 
with matched filtering. Given the discrete FT of a data 
set s{fi) (one can similarly build an ROQ in the time 
domain), and specializing to white noise Sn = ^ without 
loss of generality (one can absorb Sn into the definition 
of s), ROQ inner products between data and templates 
h{f; A) are computed as 



N/2 

(MA)|s)d = 43?^s*(/fe)/i(/fe;A)A/ 

fe=0 

N/2 N/2 

fc=0 A;=0 

'n/2 

h{X)^mY,ujkh{Fk;\) 

k=l 



43? 



E^*(^)e^''(^)A/A-i 



A;=0 



= : (ft.(A)|s)noQ, 



r 



where the coefficients ujj are given by: 

N/2 



■.= Y.s*{h)e,{fk)AfA-' 



(22) 



fe=0 



The vector 



N/2 
k=0 

is composed of inner products between all the basis el- 



ements and the data. We refer to {uJk}kLi Eq. (22) as 
data-specific weights, and their generation comprises the 
ROQ startup cost. Defining the scalar product between 
the data and the j*'' basis function by 



N/2 



^. :=E^*(^)^J-(^)^/' 



fc=0 



the data-specific weights are given by 



(23) 



(24) 



Notice that the ROQ nodes are exactly the EIM points 



which, together with the weights (22), completes our 
ROQ approximation 



{h{\)\s}„oQ^mJ2^kh{Fk;X) 



(25) 



fe=i 



The ROQ rule's accuracy only depends on the inter- 
polant's accuracy to represent h(f; A) and the accuracy of 
the original quadrature (-I-)!!- In particular, the method 
does not assume s to be well approximated by the basis 
(i.e. neither waveform modeling assumptions nor details 
about the noise realization are important). Since, as dis- 



cussed, the error of the interpolant, Eq.(20) can be ex- 



pected to decay exponentially for the cases of interest, in 
practice the ROQ replaces the original quadrature rule by 
a less expensive one with the same accuracy (within, say, 
machine precision) . How much smaller m is compared to 
N/2 is model-dependent; in Sec. VI we quantify this for 
the family of burst waveforms described in Eq. ([9]) . 



Figure [6] shows the nodes chosen by the EIM in the fre- 
quency domain, and the ROQ weights (24) for the burst 
waveforms (|9]). Figure^ in turn, shows the interpolation 
error 



\{hiX)\s) 



{h{-X)U 



(26) 



which arises in the computation of the overlap, in both 
cases with and without noise. Here the max errors label 
on the vertical axis refers to the maximum error found in 



a thorough sampling of the parameter range ( 10 ) , and the 



error is relative to a standard Riemann-sum integration 
(pTl) with 1025 points. 



r 



C. Extrinsic parameters 

So far we have described how to build ROQs over the 
intrinsic parameters characterizing the waveform signal. 
The extrinsic parameters include the arrival time of the 
signal tc |57j . the phase of the waveform at this time. 



and parameters such as the sky position, orientation and 
distance to the source. The phase of the waveform af- 
fects the model simply as multiplication by a complex 
constant, which keeps the waveform in the RB space. 
Similarly, sky position, orientation and distance just af- 
fect the amplitude of the source and the projection of 
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FIG. 6: Real (red squares) and imaginary (blue diamonds) ROQ weights computed from Eq. (24 1 for the test burst family 
waveforms in the range given by Eq. (10 1 and the injected signal with default parameters (121. The top figure is for the 



noise-free case when s{t) — h(t) while the bottom figure shows weights when s{t) — h{t) +n{t), where n{t) is a particular noise 
realization. 



the plus and cross polarizations of the waveform into a 
detector response and also do not take the waveform out 
of the RB space. However, the arrival time t^ requires 
some more discussion. 

If we denote by A the set of parameters excluding tc 
and by /io(^; A) the waveform computed with t^ = 0, then 



h{t;tc,X) = ho(t- tc\\) ■ 



(27) 



with FT given by h{f; tc. A) [ see Eq. (^]. For parameter 
estimation we compute integrals of the form 



0(tc,A):= 



Snif) 



df. 



(28) 



The simple dependence of the FT is exploited in GW 
searches by defining the function /o(/; A) via 



/o(/;A) = 



feo(/;A)g*(/) 

Snif) 



(29) 



for which 



0(t„ A) = / ioif; A)e2-/*=d/ - 27rIo{-t,; A) , (30) 
Jo 

where /o(i; A) is the inverse FT of io{f; A). Since FFTs 
are efficient, we can search over tc cheaply by doing this 
inverse FT. 



The ROQ rule that we have computed for waveforms 
/io(i;A) enables us to compute the integral of /o(/;A) 
cheaply. However, we now need to compute the integral 
of Ioif', A) exp(27ri/ic) and so the existing ROQ rule is in 
principle not guaranteed to work. However, if the ROQ is 
being used for follow-up parameter estimation, this will 
normally be triggered by the detection of a candidate 
event in the data stream of one or more detectors. These 
triggers will normally be able to localize the event to 
within a time interval comparable to a couple of cycles 
of the signal. 

In practice, the simplest approach to handling t^ is to 
build an ROQ rule for an estimated value (which we can 
denote by tc = without loss of generality) and use it for 
other arrival times within a reasonable window around 
that value. In this way we can include the arrival time 
information at no extra cost. We show the error that 
arises from using a ROQ built for tc — for non-zero 
values of tc in Fig. [8) If higher accuracy is desired, we 
can build an ROQ which includes tc within the parameter 
space without losing efficiency, since it is an offline com- 
putation. Due to the fact that an estimate of the prior of 
the tc is known, and typically small, we have found that, 
as it was expected, the number of basis (and therefore 
ROQ nodes) increases by a small amount. Alternatively, 
we can build ROQ weights tJ (tc) for different values of 
tc from Eq. (22), increasing the startup cost, and inter- 
polate uJi{tc) in tc- We have found that these coefficients 
have a weak dependence on tc making them simple to 
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FIG. 7: Integration error (26 1 versus number of ROQ nodal 



points (m) for 10, 000 randomly selected values of A. The solid 
black curve depicts the noise-free case s = h and the last data 
point (m = 54, am « 3 x 10~*) corresponds to the rule used 
in Fig. [S] The blue and red curves show the maximum and 
minimum, over 100 realizations of pure noise data, s = n, of 
the error maximised over all parameter values A. Note that 
the ROQ shows exponential convergence with respect to the 
number of ROQ nodes, even for pure notse data. 
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FIG. 8: Errors in computing the correlation between the data 
stream s and the model waveform h (see Sec|ll| (s|/i(A,tc)) 
using an ROQ rule built for fc = with accuracy better than 
~ 10~®. Empirically we find that this rule continues to work 
well for non-zero values of tc- Looking ahead to Sec. |VI| we 
anticipate evaluating the likelihood function for t^ < 0.5 sec. 



an 0{m?) cost, which is larger than the ROQ count of 
0{m). However, in many applications of interest the 
waveforms themselves are very expensive to compute and 
so this cost wiU still be much smaller than the full likeli- 
hood evaluation. 



D. Computing the likelihood 



In order to evaluate the likelihood we compute Eq. (pi) 



as 



3|s) + (/i(A)|/i(A))-23?(s|/i(A)) 



(31) 



where the last term is handled with the ROQ rule (25) 
and the first term needs to be computed once. In the 
case that the data stream s{t) contains a sine-Gaussian 
burst- waveform (It]) and white noise n{t) (see Seclll]), we 
can compute a closed-form expression for the norm. 



{h{\)\h{\)) ^AA^a^/^(l 



'i-K'f^a' 



(32) 



where /min = and /max = oo have been assumed. 
When closed-form expressions are unavailable we have 
a few options. One possibility is to build an ROQ rule 
for the norm, which requires additional offline computa- 
tions. Here we consider an alternative. Notice that the 
norm 



{h{\)\h{\))^Y.^j 



(33) 



is expressible in terms of the EIM coefficients c = 
A~^h. Explicit computation of these coefficients carries 



E. ROQ cost and efficiency 

Here we comment on ROQ offiine and startup costs as 
well as the expected speedup for likelilood evaluations. 

To find m basis functions we use the greedy algorithm 
described in Appendix IX] The asymptotic cost of this 
algorithm applied to a training set with M elements is 
O {NMm)^E\. Furthermore, the algorithm is trivially 
parallelized making large M problems accessible. Once 
the basis is built, an EIM algorithm is used to identify 
the ROQ points. As described in Appendix [B| the cost 
of the EIM is dominated by inversion of a full matrix; in 
particular the matrix defined in Eq. ( 18 1 for the first i 
basis/points ( see algorithm 2 in Ref. ^22 for a equivalent 
algorithm which utilizes a lower triangular matrix). The 
asymptotic cost of Alg. [2] and its modified equivalent are 
O (to"^ -I- Nm'^) and O (m'^ + Nm'^) respectively. 

When considering startup costs, we note that the ma- 
trix A is data-independent and can be inverted offline. To 
compute ROQ weights first i) m inner products between 



is performed. 

,2\ 



the data and all basis are computed from Eq. ( |23[ ) and 
finally ii) the matrix-vector product ([24]) 
Whence the overall startup cost is O (jnN + m" ^ 

We now compare the cost of full and compressed 
(ROQ) overlap evaluations, respectively Eq. (21) and 
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Eq. (25). Computational costs stem from evaluating 
h{fi,A)a,s well as performing the multiplications/sums. 
When the waveforms are known through closed-form, 
frequency-domain expressions we expect a speedup factor 
of approximately N/(2m). For closed-form, time-domain 
expressions the savings will be even greater if ROQ rule is 
constructed for Eq. ( plj ) while the ElM interpolant (and 
hence selected ROQ points) is built in the time-domain. 
If the waveforms are found by solving ordinary differ- 
ential equations the speedup is less straightforward to 
estimate. For example, adaptive time stepping schemes, 
such as the Runge-Kutta-Fehlberg method, permit large 
step sizes set by an error threshold (rather than equally 
spaced samples set by A/). Thus, while one should ex- 
pect fewer ODE steps to evaluate for m (as opposed to 
iV) points, the savings would be problem dependent. 



VI. RESULTS 



SNR Method 



Recovered Values 
/o a 



5 


Full 


0.189 ±0.095 


0.831 ±0.194 


ROQ 


0.189 ±0.095 


0.831 ±0.194 


10 


Full 


0.172 ±0.081 


0.803 ±0.136 


ROQ 


0.172 ±0.081 


0.803 ±0.136 


20 


Full 


0.168 ±0.075 


0.800 ±0.108 


ROQ 


0.168 ±0.075 


0.800 ±0.108 



40 



Full 
ROQ 



0.212 ±0.051 
0.212 ±0.051 



0.872 ±0.091 
0.872 ±0.091 



TABLE I: Parameter values recovered, for the waveform fre- 
quency /o and width a, using both the full and ROQ likeli- 
hoods. Values quoted are the mean and standard deviation 
estimated from the posterior for a particular noise realisa- 
tion. The same noise realisation is used for the full and ROQ 
likelihood calculations for each SNR. 



An MCMC algorithm aims to find a chain of N^cmc 
samples, {x^}, that are distributed according to the tar- 
get probability distribution, pt(xi), such that integrals 
over the probability distribution can be approximated 
by sums over the points in the chain 



iV„ 



Pt(x)/(x)dxw ^ /(x, 



(34) 



which the RB and ROQ were built, given by Eq. (10 1, 
and priors for the other parameters of tc € [—2,2], and 
A G [0.1, 10]. In order to compare the cost and accuracy 
of the standard, or full MCMC computation vs the ROQ 
one, we repeat the analysis using the same data, number 
of MCMC points, proposal distribution and priors, but 
changing from the standard, full likelihood to the ROQ 
one. The results are presented in the following sections. 



The chain of points can be obtained using the Metropolis- 
Hastings algorithm j55j . The first point, xi, is chosen at 
random from the prior. At iteration i a new point y^ 
is drawn from a proposal distribution q{yi\xi) and the 
Metropolis-Hastings ratio, r, evaluated 



Pt(y»)g(x»|yO 
Pt(xj)g(yj|xj)' 



(35) 



A random number u G [/[0, 1] is drawn and if u < r 
the move is accepted, x^+i = y^; otherwise the move is 
rejected and Xj+i = x^. 

In our case, the target distribution is the posterior 
probability distribution given by Eq. M , which depends 
on the likelihood and can therefore be approximated us- 
ing ROQs. To illustrate the method, we will consider the 
problem of recovering the parameters of a burst signal of 
the form given in Eq. nh from a noisy data stream. 

We include Gaussian white noise with unit power spec- 
tral density, S'„(/) = 1, and take the parameters of the 
true signal to be our default ones, Eq. (12 1. We assume 



that the observation is 32 sec long and the data is sam- 
pled at 64Hz. We use a symmetric Gaussian proposal 
distribution 



g(y,|x,) ex cxp -T^kixj - yi){x'i - yf)/2 



where Fj^. = {djh\dkh) is the Fisher information ma- 
trix. We use priors on /o and a that span the range over 



A. Two parameter search 

As a first test we restrict the search to two parame- 
ters — {fo,Oi} — while fixing tc and A to the injected 
values. In Table |T] we compare the parameter values re- 
covered using the full data set and Riemann sums with 
those recovered from ROQ likelihoods in one particular 
noise realization for each of four different SNRs of the 
injected source. The values are quoted as /Ui ± ai, where 
the one dimensional marginalised posterior mean, iii, and 
standard deviation, Ui, in parameter i are defined from 
the set of MCMC samples {xj} by 



1 



iV« 



iV„, 



E 



1 



-^ ''mcmc 






(36) 

In all cases the statistics of the posterior distribution 
are completely consistent between the full likelihood and 
ROQ likelihood computations. The only differences are 
beyond the significant digits quoted in the Table and are 
much smaller than the corresponding uncertainty in the 
parameter values arising from noise in the data stream. 
The ROQ likelihood is extremely accurate, with differ- 
ences of 10"^ or smaller, so it is not surprising that the 
statistical results are indistinguishable. 

We can also ask whether the full posterior distributions 
are consistent between the two likelihoods. This can be 
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achieved by using a Kolmogorov-Smirnov (KS) test [IS] 
to compare the ID and 2D marginalised posteriors ob- 
tained using the two different hkehhoods. Figure |9] shows 
the ID marginahsed posteriors for /o and a computed us- 
ing the two hkehhoods. These are indistinguishable by 
eye and, more precisely, the p-value of the KS test that 



the distributions agree are 1.0 (full and ROQ likelihood 
evaluations agree to within 11 digits) for both /o and a, 
so there is no evidence of any difference in the recovered 
posteriors. Again, this is to be expected because of the 
high accuracy of the ROQ likelihood. 
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FIG. 9: Marginalised cumulative probability distributions for /o (left panel) and a (right panel) for a true source with SNR 
p — 5. Each panel contains two curves which lie on top of each other, one computed using the Full likelihood and one using 
the ROQ likelihood. A KS test confirms that the two distributions are the same with probability 1.0 (full and ROQ likelihood 
evaluations agree to within 11 digits). 



r 



B. Four parameter search 

We now consider a search over the full four dimen- 
sional parameter space {fo,a,tc,A}. The ID and 2D 
marginalised posteriors for a typical noise realisation 
computed using both the full and ROQ likelihoods are 
shown in Fig.jlOJ while Table [TT] lists the posterior means 
and standard deviations found in a particular noise re- 
alisation using both techniques for a variety of SNRs 
of the true source. As in the two parameter case, we 
find that the statistics derived from the posterior dis- 
tributions (e.g., the mean, standard deviation, quantiles. 



etc.) are completely consistent between the full and ROQ 
likelihoods and, more precisely, p- values of the marginal- 
ized distributions are ~ 0.25 - 0.75 for 10^ point MCMC 
chains. The KS statistic, which measures the maximum 
difference in the full and ROQ cumulative probability 
distributions, computed from the margi nalized posteriors 
were ^ 10^^. As described in Sec. IVCl these small differ- 
ences stem from applying an ROQ rule built for t^ ^ 
to non-zero values of tc (see Fig. [s]) . While the resulting 
errors are smaller than the typical width of the posterior, 
if higher accuracy is desired the alternative approaches 
discussed in Section. IV CI can be used. 



r 



Having established the equivalence of the results for 
the full and ROQ likelihoods, we can now compare the 
run time. The ROQ likelihood has a higher initial cost. 



since the data-specific weights ( 24 ) have to be computed 
prior to beginning the MCMC. In general this start- 
up cost is a tiny fraction of the total run time of the 
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Full Likelihood 




FIG. 10: PDFs obtained for an injected source witii SNR p = 10, employing standard and ROQ MCMC computations in a 
four-parameter space, namely A, tc, /o and a. The figures qualitatively show the agreement between the two techniques, see 
Sec. lVIBl for more details. 



SNR Method 



Recovered values 



A 



5 


Full 


0.217 ±0.069 


0.896 ±0.194 


0.068 ±0.104 


1.704 ±0.379 


ROQ 


0.217 ±0.068 


0.897 ±0.196 


0.069 ±0.104 


1.702 ±0.375 


10 


Full 


0.212 ±0.048 


0.875 ±0.132 


0.084 ±0.053 


2.362 ±0.278 


ROQ 


0.209 ±0.050 


0.866 ±0.132 


0.085 ±0.052 


2.387 ±0.287 


20 


Full 


0.225 ±0.029 


0.891 ±0.093 


0.092 ±0.028 


2.944 ±0.176 


ROQ 


0.224 ±0.029 


0.892 ±0.093 


0.093 ±0.028 


2.944 ±0.177 



40 



Full 
ROQ 



0.248 ±0.009 
0.248 ±0.009 



0.981 ±0.041 
0.981 ±0.042 



0.097 ±0.016 
0.097 ±0.016 



3.471 ±0.157 
3.471 ±0.157 



TABLE II; As Table IT] but for searches over the full set of four parameters: waveform frequency /o and width a, coalescence 
time tc and amplitude A. The parameter valuesare recovered using the full and ROQ likelihoods. Values quoted are the mean 
and standard deviation estimated from the posterior for a particular noise realisation. The same noise realisation is used for 
the full and ROQ likelihood calculations for each SNR 



MCMC algorithm [55]. For the burst waveforms used in 
this paper, the total time taken to compute the weights 
is ~ 10ms, which is comparable with ^ 85 MCMC 
chains using the full likelihood. By comparison a re- 
solved MCMC simulation, for example the one leading 
to table HD requires ~ 5 x 10^ MCMC chains. Evidently, 
for this problem, the start-up time is a negligible fraction 
0.01% of overall cost for a resolved MCMC simulation us- 
ing the full likelihood. In light of the scalings described 
in Sec. |VE| we expect negligible start-up costs whenever 
m < N/2. 



In Fig. [TT] we show the time taken to run the MCMC 
search, i.e., after the initial set-up time, using the full and 
ROQ likelihoods. As we can see the ROQ is two orders 
of magnitude faster that the full likelihood computation. 
Figure [12] shows the ratio of the runtimes for the ROQ 
and full searches. The speed-up is seen to be ~ 25, which 



is expected in light of the scalings given in Sec. VE 



The cost of the MCMC search grows linearly with the 
number of MCMC points, as we would expect, since the 
run-time is determined primarily by the cost of likelihood 
evaluations. The speed-up from using the ROQ is, in this 
case, a factor of ~ 25. This factor will of course be prob- 
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lem and implementation-dependent, but it is roughly the 
ratio between the total number of frequency samples iV/2 
and the number of ROQ subsamples m. This ratio will 
depend on various aspects of the problem — the sampling 
cadence, total observation time, the allowed range for the 
parameters, and the waveform model itself. For example, 
if we know in advance the frequency and duration of the 
burst then carefully choosing a sampling rate and obser- 
vation time just large enough for the source used in this 
paper reduces the speed-up to ~ 10. Such tuning of the 
cadence and observation time is effectively a compression 
of the likelihood, and is very effective for a simple model 
of this type. The fact that even after such tuning the 
ROQ rule can show a significant speed-up illustrates the 
power of the method. In other problems, speed-up fac- 
tors of 10-100 are typical and factors of 1000 are possible, 
but these have to be computed on a case by case basis 
and will be reported elsewhere. An investigation of the 
speed-ups for inspiral waveforms is currently underway. 
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FIG. 12: Time ratio (speed-up) of MCMC simulations using a 
standard quadrature rule and the ROQ one. The figure shows 
the mean of the speed-up obtained by performing simulations 
with different seed values for the MCMC. See the Sec. IVIBI 
for details. 
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FIG. 11: Runtime as a function of MCMC chains for A^ „,„„ = 
lO'' samples. The red (dotted) line shows the timing for stan- 
dard MCMC computations and the blue (dashed) line the 
timing for the ROQ computations. 



VII. SUMMARY 

In this paper we have proposed using a modification 
of the Reduced Order Quadratures (ROQ) of Ref. [H] 
for fast, accurate evaluations of the correlation between 
a given data stream and a family of gravitational wave- 
forms. The modification is designed for Markov chain 
Monte Carlo (MCMC) parameter estimation studies and 
as such, it is adapted to a particular stream of (noisy) 
data. The resulting speed-up is not at the expense of re- 
duced accuracy but, instead. Reduced Order Modeling is 
used to build application and data-specific quadratures 
for the problem at hand. 



The ROQ rule requires an offline computation to build 
a waveform basis and identify a distribution of sparse 
data samples. This application-specific information can 
be stored to file and reused for any stream of data. Then, 
for a given data set we compute data-specific weights us- 
ing Eq. ( 24 ) ; the overall cost of this computation is neg- 
ligible. Fast and accurate compressed likelihood com- 
putations are then performed with Eq. ( 25 ) , which can 



be implemented within existing MCMC codes in a non- 
intrusive manner. 

For the particular application considered here as an il- 
lustration of the concept, models of burst gravitational 
waves, we have found speedups of ~ x25, depending on 
settings such as the central frequency of the wave, the 
damping factor, observation period, and sampling rate. 
These speedups are expected to increase with the com- 
plexity and fidelity of the model. 

In Ref. HH it was found that the number of Reduced 
Basis waveforms needed to represent the space of inspiral 
waveforms in the post-Newtonian stationary phase ap- 
proximation barely increases when (non-precessing) spins 
are taken into account. Since ROQ by design uses the 
same number of nodal points as the number of basis func- 
tions needed to represent the space of waveforms within a 
given accuracy, the approach holds the promise of beating 
the curse of dimensionality. There is also evidence that 
the case of precessing binaries is amenable to dimensional 
reduction [56]. 
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Appendix A: Reduced Basis 



In its simplest form, such as when the waveforms are 
inexpensive to compute, the greedy algorithm for build- 
ing a RBs has as input a set of parameter values 



T:^{K} 



M 



(Al) 



usually called training points, and associated waveforms 
{/i(-; \i)}fLi, usually called the training set. 

Part of the output is a hierarchical set of parameter 
values {Al, A2, • • • , \m] Q T (with m < M, and m < M 
or even m <^ M ii the problem is amenable to dimen- 
sional reduction) called the greedy points, and associated 
waveforms, which constitute the RBs, 

RB:={ei(-):=M-,Ai),--- , e„(-) := /i(-, A„0} . (A2) 

The RB serves as a representation of the waveforms in 
the training set and, if the latter is dense enough, of the 
whole continuum. The optimal representation by a ba- 
sis is known to be the orthogonal projection Vm onto its 
span. This result is a standard linear algebra one, inde- 
pendent of Reduced Basis or Reduced Order Modeling. 
That is, the approximation 



/i(-;A)«^c,(A)e,(.) 
1=1 

minimizes the error, 

m 

M-;A)-5^c,(A)e,(-) 

i=l 

when the coefficients c, are chosen such that 



(A3) 



h{■;\)-J2c^We^{■ 



4=1 



e,(-) 



V e, e RB . 



(A4) 



The solution to ( A4 ) is 



(A5) 



c,(A) = ^(G-i).,(M-;A)|e,(.)), 

where G^^ is the inverse of the Grammian or Gram 
trix G, with entries 



G, 



If the basis is orthonormal, this matrix is the identity 
and one recovers the familiar expression 



i-l 



{h\ei)( 



In general the RB waveforms selected by the greedy al- 
gorithm will not be orthonormal. Then at each greedy 
iteration one can use a Gram-Schmidt (GS) procedure 
to orthonormalize the RB or, equivalently, simply invert 
the Gram matrix. In either case, for any given basis, the 



optimal approximation of the form ( A3 ) is given by 



/i(.;A)«7'™/i(-;A):-^cKA)e,(- 



(A6) 



i=l 



with the coefficients Ci given by Eq. (A5). Notice that 



since the approximant ( A6 ) is defined in a completely 



geometric way, as the orthogonal projection onto the span 
of the RB elements, it is independent of whether a GS 
procedure is carried out or not. The RB (A2), at the 



same time, is composed of a set of the "most relevant" 
physical waveforms. 



The precise algorithm to choose the greedy points is 
described in Alg. [T] Given an arbitrary user-defined tol- 
erance error e, the algorithm stops when the approxima- 



tion (A6) meets the tolerance. 



\\hi-,X)-rrahi-;X)f<e V A e T. 

In all expressions the scalar product (-I-) and its asso- 
ciated norm might be weighted. In the context of GW 
physics a natural choice is that one given by Eq. Q , but 
any other choice is possible. 
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Algorithm 1 Brief description of the Greedy Algorithm 
1: Input: {A, ,/i(-;A,)}f£i, e 



Seed choice (arbitrary): Ai 
RB = {/i(-;Ai)} 
i = 1 and ai ~ 1 
while CTj > e do 

i^i + 1 

ai =max^^^\\h{-;\) - P(,_i)/i(-; A)||2 
Ai = argmax_5^g^||ft(-;A) - 'P(.,_i)/i(-; A) 
RB = RB U/i(-,A,) 
end while 



11: Output: RB and greedy points 



Appendix B: The Empirical Interpolation Method 

The EIM approach is very different, in goals and scope, 
to any variation of standard polynomial interpolation. 



which was described for completeness in Sec. IV A The 
goal of EIM is to deal with parametrized problems char- 
acterized by non-polynomial bases. The set of EIM 
points is nested and hierarchical, as one would want when 
solving differential equations, and easily handles unstruc- 
tured meshes in several dimensions. 

Consider a basis {eiix)}"!^-^ whose span accurately ap- 
proximates the functions h{x;\). For definiteness we 
will denote by x the physical dimension(s) and A the 
parametrization of these functions. For example, if /i is a 
GW then x could denote time or frequency, and A the in- 
trinsic or extrinsic parameters of the system. Let {xi}^^ 
denote a set of N points and define the corresponding N- 
vector X = (xi, X2, . . . , xn) ■ Discrete objects arise from 
evaluating continuous functions at x. For example, defin- 
ing hi{\) = h{xi\ A), the GW A- vector is h{\) = h{x; A). 
Similarly, e^ — ei{x) denotes the i*'' basis function eval- 
uated at X. 

Given an input of m evaluated basis functions {6^}™ j^ 
the output of the EIM algorithm is a set of m EIM points 



{AJ™ 1 C {x,} 



N 



(Bl) 



selected as a subset of {xi}^^^. If a function h{x;\) is 
known at the EIM points {Ai}™ j, the EIM interpolant 
can predict with high accuracy the function at any other 
value of {xi]f^i. It is an interpolant in the usual sense, 
meaning that it agrees with the interpolated function at 
the interpolation points. 



Z„J/i](A„A) = MA„A) for 1 = 1, 



. ,TO . 



The EIM interpolant is given by Eq. (19), while the se- 
lection of the EIM points is described inAlgorithm[2] To 
assist with the description of the EIM algorithm we de- 
fine the j-term empirical interpolant built from the first 
j basis functions and points 



where the c; coefficients are solutions to the j-point in- 
terpolation problem 

I,[h]{Xk;\) = h(Xk;\), Vfc = l,...,j. (B3) 



Algorithm 2 Selection of EIM Points 



N 



1: Input: Evaluated basis {ci}™ i and points {x} 

2: i = argmaxjeil Comment: here argmax takes a vec- 
tor and returns the index of its largest entry. 

3: Set Ai = Xi 

4: for j = 2 — > m do 

5: Find Ij_i[ej](a;) 

6: Compute the point-wise error r — Ij_i [ej] (x) — Cj 
7: i = argmax|f^ 

8: Set Aj = Xj 

9: end for 

10: Output: EIM points {AJ™ ^ 



Comments 

1. In standard polynomial interpolation the interpolant 
is a linear combination of polynomials and function val- 
ues, as in Eq. (14). In the EIM the interpolant is a linear 



combination of (in the case of interest for this paper), 
waveforms and function values in the physical dimen- 
sion(s), as given more precisely by Eq. ( 16 1. Parametriza- 
tion and "physical" dimensions play a dual role. 

2. Unlike Gaussian (e.g., Chebyshev) interpolation 
nodes, EIM nodes are nested and hierarchical. Given 
a hierarchical basis 

{ei(x)} C {ei(x), e2(x)} C . . . C {e,(x)}^i 

an associated set of EIM points 

{Ai}c{Ai,A2}c...c{A,}^i 

is defined. Each set of p EIM nodes is included within the 
set of p' EIM nodes whenever p < p' and only depends 
on the basis of dimension p. 

3. The empirical interpolant satisfies 

max \\h{-- A) - Ira[h{-., A)]||' < Kl^Gm , 

A 

where Um characterizes the representation error of the ba- 
sis as defined in Eq. ^ and A„i is a computable Lebesgue 
constant (see Theorem 2 of Ref. [22]). Furthermore, due 
to the slow growth of A^ , often comparable to the best 
possible scaling [37], the interpolant is said to be nearly 
optimal. 



Ij[h]{x;\) :== 'Vci(A)ei(x) 



(B2) 
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